Apparatus, system and method for noise cancellation and communication for incubators and related devices

ABSTRACT

Systems, apparatuses and methods for integrating adaptive noise cancellation (ANC) with communication features in an enclosure, such as an incubator, bed, and the like. Utilizing one or more error and reference microphones, a controller for a noise cancellation portion reduces noise within a quiet area of the enclosure. Voice communications are provided to allow external voice signals to be transmitted to the enclosure with minimized interference with noise processing. Vocal communications from within the enclosure may be processed to determine certain characteristics/features of the vocal communications. Using these characteristics, certain emotive and/or physiological states may be identified.

RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application Ser. No. 13/673,005 titled “Encasement for Abating Environmental Noise, Hand-Free Communication and Non-Invasive Monitoring and Recording” filed on Nov. 9, 2012, which is a continuation of U.S. patent application Ser. No. 11/952,250 titled “Electronic Pillow for Abating Snoring/Environmental Noises, Hands-Free Communications, And Non-Invasive Monitoring And Recording” by Sen M. Kuo filed Dec. 7, 2007, the contents of each of which are incorporated by reference in their entirety herein.

BACKGROUND

The present disclosure relates to an electronic enclosure or encasement advantageously configured for an incubator or similar device, where excessive noise may be an issue. In particular, the present disclosure relates to an electronic enclosure including active noise control and communication.

In U.S. patent application Ser. No. 11/952,250, referenced above and assigned to the assignee of the present application, techniques were disclosed for abating noise, such as snoring, in the vicinity of a human head by utilizing Adaptive Noise Control (ANC). More specifically, utilizing a multiple-channel feed-forward ANC system using adaptive FIR filters with a 1×2×2 FXLMS algorithm, a noise suppression system may be particularly effective at reducing snoring noises. While noise suppression is desirable for adult humans, special requirements may be needed in the cases of babies, infants, and other life forms that may have sensitivity to noise.

Newborn babies, and particularly premature, ill, and low birth weight infants, are often placed in special units, such as neonatal intensive care units (NICUs), where they require specific environments for medical attention. Devices such as incubators have greatly increased the survival of very low birth weight and premature infants. However, high levels of noise in the NICU have been shown to result in numerous adverse health effects, including hearing loss, sleep disturbance and other forms of stress. At the same time, an important relationship during infancy is the attachment or bonding to a caregiver, such as a mother and/or father. This is due to the fact that this relationship may determine the biological and emotional ‘template’ for future relationships and well-being. It is generally known that healthy attachment to the caregiver through bonding experiences during infancy may provide a foundation for future healthy relationships. However, infants admitted to an NICU may lose such experiences in their earliest life due to limited interaction with their parents caused by noise and/or a lack of means of communication. Therefore, it is important to reduce the noise level inside the incubator and increase bonding opportunities for NICU babies and their parents. In addition, there are advantages for newborns inside the incubators in hearing their mothers' voices, which can help relieve stress and improve language development. Communicating with NICU babies can also benefit the new mothers, for example by preventing postpartum depression, improving bonding, etc.

Regarding communication, it would be advantageous to provide “cues” to a caregiver based on an infant's cry, so that the infant may be understood, albeit on a rudimentary level. These cues may be advantageous for interpreting a likely condition of the infant via its vocal communication. The airways of newborn infants are quite different from those of adults. The larynx in newborn infants is positioned close to the base of the skull. The high position of the larynx in the newborn is similar to its position in other animals and allows the newborn human to form a sealed airway from the nose to the lungs. The soft palate and epiglottis provide a “double seal,” and liquids can flow around the relatively small larynx into the esophagus while air moves through the nose, through the larynx and trachea into the lungs. The anatomy of the upper airways in newborn infants is “matched” to a neural control system (newborn infants are obligate nose breathers). They normally will not breathe through their mouths even in instances where their noses may be blocked. The unique configuration of the vocal tract is the reason for the extremely nasalized cry of the infant.

From one perspective, the increasing alertness and decreasing crying as part of the sleep/wakefulness cycle suggests that there may be a balanced exchange between crying and attention. The change from sleep/cry to sleep/alert/cry necessitates the development of control mechanisms to modulate arousal. The infant must increase arousal more gradually, in smaller increments, to maintain states of attention for longer periods. Crying is a heightened state of arousal produced by nervous system excitation triggered by some form of perceived threat, such as hunger, pain, or sickness, or individual differences in thresholds for stimulation. Crying is modulated and developmentally facilitated by control mechanisms to enable the infant to maintain non-crying states.

The cry serves as the primary means of communication for infants. While it is possible for experts (experienced parents and child care specialists) to distinguish infant cries through training and experience, it is difficult for new parents and for inexperienced child care workers to interpret infant cries. Accordingly, techniques are needed to extract audio features from the infant cry so that different communicated states for an infant may be determined. Cry Translator™, a commercially available product known in the art, claims to be able to identify five distinct cries: hunger, sleep, discomfort, stress and boredom. An exemplary description of the product may be found in US Pat. Pub. No. 2008/0284409, titled “Signal Recognition Method With a Low-Cost Microcontroller,” which is incorporated by reference herein. However, such configurations are less robust, provide limited information, are not necessarily suitable for NICU applications, and do not provide integrated noise reduction.

Accordingly, there is a need for infant voice analysis, as well as a need to couple voice analysis with noise reduction. Using an infant's cry as a diagnostic tool may play an important role in determining infant voice communication, and for determining emotional, pathological and even medical conditions, such as SIDS, problems in developmental outcome and colic, medical problems in which early detection is possible only by invasive procedures such as chromosomal abnormalities, etc. Additionally, related techniques are needed for analyzing medical problems which may be readily identified, but would benefit from an improved ability to define prognosis (e.g., prognosis of long term developmental outcome in cases of prematurity and drug exposure).

SUMMARY

Under one exemplary embodiment, an enclosure, such as an incubator and the like, is disclosed comprising a noise cancellation portion, comprising a controller unit, configured to be operatively coupled to one or more error microphones and a reference sensing unit, wherein the controller unit processes signals received from one or more error microphones and reference sensing unit to reduce noise in an area within the enclosure using one or more speakers. The enclosure includes a communications portion, comprising a sound analyzer and transmitter, wherein the communications portion is operatively coupled to the noise cancellation portion, said communications portion being configured to receive a voice signal from the enclosure and transform the voice signal to identify characteristics thereof.

In another exemplary embodiment, a method is disclosed for providing noise cancellation and communication within an enclosure, where the method includes the steps of processing signals, received from one or more error microphones and reference sensing unit, in a controller of a noise cancellation portion to reduce noise in an area within the enclosure using one or more speakers; receiving internal voice signals from the enclosure; transforming the internal voice signals; and identifying characteristics of the voice signals based on the sound analyzing.

In a further exemplary embodiment, an enclosure is disclosed comprising a noise cancellation portion, comprising a controller unit, configured to be operatively coupled to one or more error microphones and a reference sensing unit, wherein the controller unit processes signals received from one or more error microphones and reference sensing unit to reduce noise in an area within the enclosure using one or more speakers; a communications portion, comprising a sound analyzer and transmitter, wherein the communications portion is operatively coupled to the noise cancellation portion, said communications portion being configured to receive a voice signal from the enclosure and transform the voice signal to identify characteristics thereof; and a voice input apparatus operatively coupled to the noise cancellation portion, wherein the voice input apparatus is configured to receive external voice signals for reproduction on the one or more speakers.

In still further exemplary embodiments, the communications/signal recognition portion described above may be configured to transform the voice signal from a time domain to a frequency domain, wherein the transformation comprises at least one of linear predictive coding (LPC), Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC) and short-time zero crossing. The communications portion may be further configured to identify characteristics of the transformed voice signal using at least one of a Gaussian mixture model (GMM), hidden Markov model (HMM), and artificial neural network (ANN). In yet another exemplary embodiment, the enclosure described above may include a voice input operatively coupled to the noise cancellation portion, wherein the voice input is configured to receive external voice signals for reproduction on the one or more speakers, wherein the noise cancellation portion is configured to filter the external voice signals to minimize interference with signals received from one or more error microphones and reference sensing unit for reducing noise in the area within the enclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

Other advantages will be readily appreciated as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings wherein:

FIG. 1 is an exemplary block diagram of a controller unit under one embodiment;

FIG. 2 is a functional diagram of an exemplary multiple-channel feed-forward ANC system using adaptive FIR filters with the 1×2×2 FXLMS algorithm under one embodiment;

FIG. 3 illustrates a wireless communication integrated ANC system 300, combining wireless communication and ANC algorithms for an enclosure under one embodiment;

FIG. 4 illustrates a general multi-channel ANC system suitable for the embodiment of FIG. 3 under one embodiment;

FIG. 5 illustrates a general multi-channel ANC system combined with the external voice communication for an enclosure under one exemplary embodiment;

FIGS. 6A and 6B illustrate spectra of error signals and noise cancellation before and after ANC for error microphones under one exemplary embodiment;

FIG. 7 is a chart illustrating a relationship between a bit error rate (BER) and signal-to-noise ratios (SNR) under one exemplary embodiment;

FIG. 8 illustrates an exemplary MFCC feature extraction procedure under one exemplary embodiment;

FIG. 9 illustrates one effect of convolving a power spectrum with a Mel scaled triangular filter bank under one embodiment;

FIG. 10 illustrates an exemplary nonlinear Mel frequency curve under one embodiment;

FIG. 11 illustrates an exemplary linear vector quantization (LVQ) neural network model architecture under one embodiment; and

FIGS. 12A-D illustrate various voice feature identification characteristics under one exemplary embodiment.

DETAILED DESCRIPTION

As is known from U.S. patent application Ser. No. 11/952,250, noise reduction may be enabled in an electronic encasement comprising an encasement unit (e.g., pillow) in electrical connection with a controller unit and a reference sensing unit. The encasement unit may comprise at least one error microphone and at least one loudspeaker that are in electrical connection with the controller unit. Under a preferred embodiment, two error microphones may be used, positioned to be close to the ears of a subject (i.e., human). The error microphones may be configured to detect various signals or noises created by the user and relay these signals to the controller unit for processing. For example, the error microphones may be configured to detect speech sounds from the user when the electronic encasement is used as a hands-free communication device. The error microphones may also be configured to detect noises that the user hears, such as snoring or other environmental noises when the electronic encasement is used for ANC. A quiet zone created by ANC is centered at the error microphones. Accordingly, placing the error microphones inside the encasement below the user's ears, generally around a middle third of the encasement, may ensure that the user is close to the center of a quiet zone that has a higher degree of noise reduction.

Additionally, there may be one or more loudspeakers in the encasement, also preferably configured to be relatively close to the user's ears. More or fewer loudspeakers can be used depending on the desired function. Under a preferred embodiment, the loudspeakers are configured to produce various sounds. For example, the loudspeakers can produce speech sound when the electronic encasement acts as a hands-free communication device, and/or can produce anti-noise to abate any undesired noise. In another example, the loudspeakers can produce audio sound for entertainment or masking of residual noise. Preferably, the loudspeakers are small enough so as not to be noticeable. There are advantages to placing the loudspeakers relatively close to ears of a user, as the level of anti-noise generated by the loudspeakers is maximized compared to configurations where loudspeakers are placed in more remote locations. Lower noise levels also tend to reduce power consumption and reduce undesired acoustic feedback from the loudspeakers back to the reference sensing unit. The configurations described above may be equally applicable to enclosures, such as an incubator, as well as encasements. Also, it should be understood by those skilled in the art that use of the term “enclosure” does not necessarily mean that an area around noise cancellation is fully enclosed. Partial enclosures, partitions, walls, rails, dividers etc. are equally contemplated herein.

Turning to FIG. 1, the controller unit 14 is a signal processing unit for sending and receiving signals as well as processing and analyzing signals. The controller unit 14 may include various processing components such as, but not limited to, a power supply, amplifiers, computer processor with memory, and input/output channels. The controller unit 14 can be contained within an enclosure, discussed in greater detail below (see FIG. 3), or it can be located outside of the enclosure. The controller unit 14 further includes a power source 24. The power source 24 can be AC such as a cord to plug into a wall socket or battery power such as a rechargeable battery pack. The embodiment of FIG. 1 preferably has at least one input channel 32, where the number of input channels 32 may be equal to the total number of error microphones in the enclosure and reference microphones in the reference sensing unit. The input channels 32 may be analog, and include signal conditioning circuitry, a preamplifier 34 with adequate gain, an anti-aliasing lowpass filter 36, and an analog-to-digital converter (ADC) 38. The input channels 32 receive signals (or noise) from the error microphones and the reference microphones.

In the embodiment of FIG. 1, there may be at least one output channel 40. The number of output channels 40 may be equal to the number of loudspeakers in the enclosure. The output channels 40 are preferably analog, and include a digital-to-analog converter (DAC) 42, smoothing (reconstruction) lowpass filter 44, and power amplifier 46 to drive the loudspeakers. The output channels 40 are configured to send a signal to the loudspeakers to make sound. Digital signal processing unit (DSP) 48 generally includes a processor with memory. The DSP receives signals from the input channels 32 and sends signals to the output channels 40. The DSP can also interface (i.e., input and output) with other digital systems 50, such as, but not limited to, audio players for entertainment and/or for creating environmental sounds (e.g., waves, rainfall), digital storage devices for sound recording, communication interfaces, or diagnostic equipment. DSP 48 may also include one or more algorithms for operation of the electronic enclosure.

Generally speaking, the algorithm(s) may control interactions between the error microphones, the loudspeakers, and reference microphones. Preferably, the algorithm(s) may be one of (a) multiple-channel broadband feed-forward active noise control for reducing noise, (b) adaptive acoustic echo cancellation, (c) signal detection to avoid recording silence periods and sound recognition for non-invasive detection, or (d) integration of active noise control and acoustic echo cancellation. Each of these algorithms is described more fully below. The DSP can also include other functions such as non-invasive monitoring using microphone signals and an alarm to alert or call caregivers for emergency situations.

The reference sensing unit includes at least one reference microphone. Preferably, the reference microphones are wireless for ease of placement, but they can also be wired. The reference microphones are used to detect the particular noise that is desired to be abated and are therefore placed near that sound. For example, if it is desired to abate noises in an enclosure from other rooms that can be heard through a door, the reference microphone may be placed directly on the door. The reference microphone may advantageously be placed near a noise source in order to minimize such noises near an enclosure. As will be described in further detail below, an enclosure equipped with noise-cancellation hardware may be used for a variety of methods in conjunction with the algorithms. For example, the enclosure can be used in a method of abating unwanted noise by detecting an unwanted noise with a reference microphone, analyzing the unwanted noise, producing an anti-noise corresponding to the unwanted noise in the enclosure, and abating the unwanted noise. Again, the reference microphone(s) may be placed wherever the noise to be abated is located. These reference microphones detect the unwanted noise, and the error microphones 20 detect the unwanted noise levels at the enclosure's location. Both the reference and error microphones send signals to the input channels 32 of the controller unit 14, the signals are analyzed with an algorithm in the DSP, and signals are sent from the output channels 40 to the loudspeakers. The loudspeakers then produce an anti-noise (which may be produced by an anti-noise generator) that abates the unwanted noise. With this method, the algorithm of multiple-channel broadband feed-forward active noise control for reducing noise is used to control the enclosure.

The enclosure can also be used in a method of communication by sending and receiving sound waves through the enclosure in connection with a communication interface. The method operates essentially as described above; however, the error microphones are used to detect speech and the loudspeakers may broadcast vocal sounds. With this method, the algorithm of adaptive acoustic echo cancellation for communications may be used to control the enclosure, as described above, and this algorithm can be combined with active noise control as well. The configuration for the enclosure may be used in a method of recording and monitoring disorders, by recording noises produced within the enclosure with microphones encased within a pillow. Again, this method operates essentially as described above; however, the error microphones are used to record sounds in the enclosure to diagnose sleep disorders. With this method, the algorithm of signal detection to avoid recording silence periods and sound recognition for non-invasive detection is used to control the enclosure.

The enclosure can further be used in a method of providing real-time response to emergencies by detecting a noise with a reference microphone in an enclosure, analyzing the noise, and providing real-time response to an emergency indicated by the analyzed noise. The method is performed essentially as described above. Certain noises detected are categorized as potential emergency situations, such as, but not limited to, the cessation of breathing, extremely heavy breathing, choking sounds, and cries for help. Detecting such a noise prompts the performance of real-time response action, such as producing a noise with the loudspeakers, or notifying caregivers or emergency responders of the emergency. Notification can occur in conjunction with the communications features of the enclosure, i.e., by sending a message over telephone lines, wireless signal or by any other warning signals sent to the caregivers. The enclosure may also be used in a method of playing audio sound by playing audio sound through the loudspeakers of the enclosure. The audio sound can be of any kind, such as soothing music or nature sounds. This method can also be used to abate unwanted noise, as the audio sound masks environmental noises. Also, by locating the loudspeakers inside the enclosure, lower volume can be used to play the audio sound.

Turning to FIG. 2, an exemplary illustration is provided for performing Multiple-Channel Broadband Feed-forward Active Noise Control for an enclosure. In this example a multiple-channel feed-forward ANC system is configured with one reference microphone, two independent loudspeakers and two error microphones. The multiple-channel ANC system uses adaptive FIR filters with the 1×2×2 FXLMS algorithm. The reference signal x(n) is sensed by the reference microphone in the reference sensing unit. Two error microphones (located in the pillow unit) obtain the error signals e₁(n) and e₂(n), and the system is thus able to form two individual quiet zones centered at the error microphones that are close to the ears of the sleeper. The ANC algorithm uses two adaptive filters W₁(z) and W₂(z) to generate two anti-snore signals y₁(n) and y₂(n) to drive the two independent loudspeakers (also embedded inside the pillow unit). Ŝ₁₁(z), Ŝ₁₂(z), Ŝ₂₁(z), and Ŝ₂₂(z) are the estimates of the secondary path transfer functions, obtained using either online or offline secondary path modeling techniques.

The 1×2×2 FXLMS algorithm may be summarized as follows:

$y_{i}(n) = w_{i}^{T}(n)\,x(n), \quad i = 1, 2 \qquad (1)$

$w_{1}(n+1) = w_{1}(n) + \mu_{1}\left[ e_{1}(n)\,x(n) * \hat{s}_{11}(n) + e_{2}(n)\,x(n) * \hat{s}_{21}(n) \right] \qquad (2)$

$w_{2}(n+1) = w_{2}(n) + \mu_{2}\left[ e_{1}(n)\,x(n) * \hat{s}_{12}(n) + e_{2}(n)\,x(n) * \hat{s}_{22}(n) \right] \qquad (3)$

where w₁(n) and w₂(n) are coefficient vectors and μ₁ and μ₂ are the step sizes of the adaptive filters W₁(z) and W₂(z), respectively, * denotes convolution, and ŝ₁₁(n), ŝ₁₂(n), ŝ₂₁(n) and ŝ₂₂(n) are the impulse responses of the secondary path estimates Ŝ₁₁(z), Ŝ₁₂(z), Ŝ₂₁(z), and Ŝ₂₂(z), respectively.
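For illustration only, equations (1) through (3) can be sketched in a few lines of Python. This is a minimal sketch and not the disclosed implementation: the class name `Fxlms1x2x2`, the helper `fir`, the dictionary keyed by (error microphone m, loudspeaker k), and the toy unit-gain path estimates are all assumptions introduced here. The sign of the update follows equations (2) and (3) as written.

```python
from collections import deque


def fir(coeffs, history):
    """Inner product of FIR coefficients with a newest-first sample history."""
    return sum(c * h for c, h in zip(coeffs, history))


class Fxlms1x2x2:
    """One reference input, two loudspeakers (k), two error microphones (m)."""

    def __init__(self, taps, s_hat, mu1, mu2):
        # s_hat[(m, k)]: estimated impulse response, loudspeaker k -> error mic m
        self.s_hat = s_hat
        self.w = {1: [0.0] * taps, 2: [0.0] * taps}
        self.mu = {1: mu1, 2: mu2}
        depth = max(taps, max(len(s) for s in s_hat.values()))
        self.x_hist = deque([0.0] * depth, maxlen=depth)
        # Filtered-reference histories x(n) * s_hat_mk(n), one per path
        self.fx_hist = {key: deque([0.0] * taps, maxlen=taps) for key in s_hat}

    def output(self, x_n):
        """Equation (1): push a new reference sample, return (y1, y2)."""
        self.x_hist.appendleft(x_n)
        for key, s in self.s_hat.items():
            self.fx_hist[key].appendleft(fir(s, self.x_hist))
        return tuple(fir(self.w[k], self.x_hist) for k in (1, 2))

    def adapt(self, e1, e2):
        """Equations (2) and (3): update w1 and w2 from the two error samples."""
        for k in (1, 2):
            pairs = zip(self.w[k], self.fx_hist[(1, k)], self.fx_hist[(2, k)])
            self.w[k] = [w + self.mu[k] * (e1 * f1 + e2 * f2) for w, f1, f2 in pairs]


# Toy usage with unit-gain path estimates (hypothetical values)
s_hat = {(m, k): [1.0] for m in (1, 2) for k in (1, 2)}
anc = Fxlms1x2x2(taps=2, s_hat=s_hat, mu1=0.1, mu2=0.1)
y1, y2 = anc.output(2.0)      # weights start at zero, so both outputs are zero
anc.adapt(e1=1.0, e2=0.5)     # one adaptation step
```

In a real-time system the `output` and `adapt` calls would alternate once per sample, with e₁(n) and e₂(n) read from the error microphones after the loudspeakers have played y₁(n) and y₂(n).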

Configurations directed to adaptive acoustic echo cancellation and integration of active noise control with acoustic echo cancellation are disclosed in U.S. patent application Ser. No. 11/952,250, and will not be repeated here for the sake of brevity. However, it should be understood by those skilled in the art that the techniques described therein may be applicable to the present disclosure, depending on the needs of the enclosure designer.

Turning to FIG. 3, one example of a wireless communication integrated ANC system 300, combining wireless communication and ANC algorithms for an incubator enclosure, is disclosed. Here, the ANC may be configured to cancel unwanted noises and the wireless communication can provide two-way communications between parents and infants. The embodiment of FIG. 3 preferably comprises a sound analysis and communications portion 301, including (1) an ANC portion (302, 305, 306, 311) for reducing external noise for the infant incubator, and (2) a wireless communication portion (303, 304) integrated with the ANC system to provide communication between infants and their parents or caregivers. In order to comfort infants, the desired speech signal, such as a mother's voice, may be picked up in receiver 302, processed and played to the infant through the loudspeaker 311 inside the incubator. The infant audio signals, such as crying, breathing, and cooing, will be picked up by the error microphone inside the incubator 310, processed, and played externally.

The noise abatement of system 300 may be viewed as comprising four modules or units including (1) a noise control acoustic unit, (2) an electronic controller unit, (3) a reference sensing unit, and (4) a communication unit. The noise control acoustic unit includes one or more anti-noise loudspeakers 311, at least partially operated by anti-noise generator 306, and microphones (error microphone 307, and reference microphone 308), operatively coupled to an electronic controller which may be part of unit 306 and/or 301. The controller may include a power supply and amplifiers, a processor with memory, and input/output channels for performing signal processing tasks. The reference sensing unit may comprise wired or wireless microphones (308), which can be placed outside the incubator 310 for abating outside noise 311, or alternately on windows for abating environmental noises, or doors for reducing noise from other rooms, or on other known noise sources. The wireless communication unit may include wireless or wired transmitters and receivers (302, 304) for communication purposes.

A general multi-channel ANC system suitable for the embodiment of FIG. 3 is illustrated in FIG. 4, where the embodiment is configured with the assumption that there are J reference sensors (microphones), K secondary sources and M error sensors (microphones). The J-channel reference signals may be expressed as:

$x(n) = \left[ x_{1}^{T}(n) \;\; x_{2}^{T}(n) \;\; \ldots \;\; x_{J}^{T}(n) \right]^{T}$

where x_(j)(n) is the jth-channel reference signal vector of length L. The secondary sources have K channels, or

$y(n) = \left[ y_{1}(n) \;\; y_{2}(n) \;\; \ldots \;\; y_{K}(n) \right]^{T},$

where y_(k)(n) is the signal of the kth output channel at time n. The error signals have M channels, or

$e(n) = \left[ e_{1}(n) \;\; e_{2}(n) \;\; \ldots \;\; e_{M}(n) \right]^{T}$

where e_(m)(n) is the error signal of the mth error channel at time n. Both the primary noise d(n) and the cancelling noise d′(n) are vectors with M elements at the locations of the M error sensors.

The primary-path impulse responses (402) can be expressed in matrix form as

$P(n) = \begin{bmatrix} p_{11}(n) & p_{12}(n) & \ldots & p_{1J}(n) \\ p_{21}(n) & p_{22}(n) & \ldots & p_{2J}(n) \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1}(n) & p_{M2}(n) & \ldots & p_{MJ}(n) \end{bmatrix}$

where p_(mj)(n) is the impulse response function from the jth reference sensor to the mth error sensor. The matrix of secondary path impulse response functions (405) may be given by

$S(n) = \begin{bmatrix} s_{11}(n) & s_{12}(n) & \ldots & s_{1K}(n) \\ s_{21}(n) & s_{22}(n) & \ldots & s_{2K}(n) \\ \vdots & \vdots & \ddots & \vdots \\ s_{M1}(n) & s_{M2}(n) & \ldots & s_{MK}(n) \end{bmatrix}$

where s_(mk)(n) is the impulse response function from the kth secondary source to the mth error sensor. An estimate of S(n), denoted as Ŝ(n) (401), can be similarly defined.

Matrix A(n) may comprise the impulse response functions of the feed-forward adaptive finite impulse response (FIR) filters (403), which have J inputs, K outputs, and filter order L,

$A(n) = \left[ A_{1}^{T}(n) \;\; A_{2}^{T}(n) \;\; \ldots \;\; A_{K}^{T}(n) \right]^{T},$ where

$A_{k}(n) = \left[ A_{k,1}^{T}(n) \;\; A_{k,2}^{T}(n) \;\; \ldots \;\; A_{k,J}^{T}(n) \right]^{T}, \quad k = 1, 2, \ldots, K$

is the weight vector of the kth feed-forward FIR adaptive filter with J input signals defined as

$A_{k,j}(n) = \left[ a_{k,j,1}(n) \;\; a_{k,j,2}(n) \;\; \ldots \;\; a_{k,j,L}(n) \right]^{T},$

which is the feed-forward FIR weight vector from the jth input to the kth output.

The secondary sources may be driven by the summation (406) of the feed-forward and feedback filter outputs. That is

${y_{k}(n)} = {{\sum\limits_{j = 1}^{J}{{x_{j}^{T}(n)}{A_{k,j}(n)}}} = {{x^{T}(n)}{A_{k}(n)}}}$

The error signal vector measured by M sensors is

$\begin{matrix}{{e(n)} = {{d(n)} + {y^{\prime}(n)}}} \\{= {{d(n)} + {{S(n)}*\lbrack {{X^{T}(n)}{A(n)}} \rbrack}}}\end{matrix}$

where d(n) is the primary noise vector and y′(n) is the canceling signal vector at the error sensors.

The filter coefficients are iteratively updated to minimize a defined criterion. The sum of the mean square errors may be used as the cost function defined as

${\xi (n)} = {{\sum\limits_{m = 1}^{M}{E\{ {e_{m}^{2}(n)} \}}} = {{e^{T}(n)}{e(n)}}}$

The least mean square (LMS) adaptive algorithm (404) uses a steepest descent approach to adjust the coefficients of the feed-forward and feedback adaptive FIR filters in order to minimize ξ(n) as follows:

A(n+1)=A(n)−μ_(a) X′(n)e(n)

where μ_(a) and μ_(b) are the step sizes for feedforward and feedback ANC systems, respectively. In another embodiment, different values may be used to improve convergence speed. The filtered reference matrix X′(n) is given by

$X^{\prime}(n) = \left[ \hat{S}(n) * X^{T}(n) \right]^{T} = \left[ \begin{bmatrix} \hat{s}_{11}(n) & \hat{s}_{12}(n) & \ldots & \hat{s}_{1K}(n) \\ \hat{s}_{21}(n) & \hat{s}_{22}(n) & \ldots & \hat{s}_{2K}(n) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{s}_{M1}(n) & \hat{s}_{M2}(n) & \ldots & \hat{s}_{MK}(n) \end{bmatrix} * \begin{bmatrix} x(n) & 0 & \ldots & 0 \\ 0 & x(n) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & x(n) \end{bmatrix}^{T} \right]^{T}$

that is,

$X^{\prime}(n) = \begin{bmatrix} x_{11}^{\prime}(n) & x_{12}^{\prime}(n) & \ldots & x_{1M}^{\prime}(n) \\ x_{21}^{\prime}(n) & x_{22}^{\prime}(n) & \ldots & x_{2M}^{\prime}(n) \\ \vdots & \vdots & \ddots & \vdots \\ x_{K1}^{\prime}(n) & x_{K2}^{\prime}(n) & \ldots & x_{KM}^{\prime}(n) \end{bmatrix}$

and

$x_{km}^{\prime}(n) = s_{mk}(n) * x(n) = \begin{bmatrix} s_{mk}(n) * x_{1}^{T}(n) & s_{mk}(n) * x_{2}^{T}(n) & \ldots & s_{mk}(n) * x_{J}^{T}(n) \end{bmatrix} = \begin{bmatrix} x_{km1}^{\prime T}(n) & x_{km2}^{\prime T}(n) & \ldots & x_{kmJ}^{\prime T}(n) \end{bmatrix}$

The updated adaptive filter coefficients can be expressed as

$A_{k}(n+1) = A_{k}(n) - \mu \sum\limits_{m=1}^{M} x_{km}^{\prime}(n)\, e_{m}(n)$

and this can be further expanded as

$\begin{matrix} {A_{k,j}(n+1) = A_{k,j}(n) - \mu \sum\limits_{m=1}^{M} x_{kmj}^{\prime}(n)\, e_{m}(n)} \\ {= A_{k,j}(n) - \mu \sum\limits_{m=1}^{M} \left[ s_{mk}(n) * x_{j}(n) \right] e_{m}(n)} \end{matrix}$

In addition to noise reduction, the embodiment of FIG. 3 may be advantageously configured to provide a level of communication for an infant. In order to comfort infants, a desired audio signal, such as a mother's voice, is picked up by receiver 302, processed, and reproduced to an infant through the anti-noise loudspeaker 311 inside incubator 310. In turn, infant audio signals such as crying, breathing, and cooing will be picked up by the error microphone 307 inside incubator 310, processed (303, 304), and reproduced via a separate speaker (not shown), where an emotional or physiological state may also be displayed via visual or audio indicia (e.g., screen, lights, automated voice, etc.). This configuration may allow parents outside the NICU to speak to and listen to the infant inside the incubator, thereby improving bonding even when visits to the NICU are limited in time.

Under one embodiment, direct-sequence spread spectrum (DS/SS) techniques may be used to conduct wireless communication. In another embodiment, orthogonal frequency-division multiplexing (OFDM) or ultra-wideband (UWB) techniques may be used. For DS/SS communications, each information symbol may be spread using a length-L spreading code. That is,

d(k)=v(n)c(n,l)  (7)

where v(n) is the symbol-rate information-bearing voice signal, and c(n,l) is the binary spreading sequence of the nth symbol. In one embodiment, c(n) is used instead of c(n,l) for simplicity. The received chip-rate matched-filtered and sampled data sequence can be expressed as the product of the chip-rate sequence d(k) and its spatial signature h,

p(k)=d(k)h  (8)

Within a symbol interval, after chip-rate processing, the received data becomes

r=p+w  (9)

where the L-by-1 vector p contains the signal of interest, and w is white noise.
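The spreading and despreading relations in equations (7) through (9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the length-7 code below is a hypothetical ±1 sequence (not a Gold code), the spatial signature h is reduced to a scalar, and the "noise" is a fixed alternating pattern so that the example is deterministic. The helper names `spread` and `despread` are likewise invented for the example.

```python
def spread(symbols, code):
    """Return the chip-rate sequence d(k) = v(n) * c(l) for every symbol v(n)."""
    return [v * c for v in symbols for c in code]


def despread(chips, code):
    """Correlate each length-L block of chips with the code and normalize by L."""
    L = len(code)
    return [sum(r * c for r, c in zip(chips[i:i + L], code)) / L
            for i in range(0, len(chips), L)]


code = [1, -1, 1, 1, -1, -1, 1]   # hypothetical +/-1 spreading code, L = 7
symbols = [1, -1, 1]              # symbol-rate data v(n)
h = 1.0                           # scalar stand-in for the spatial signature

p = [h * d for d in spread(symbols, code)]                 # p(k) = d(k) h
w = [0.3 if k % 2 == 0 else -0.3 for k in range(len(p))]   # stand-in for white noise
r = [pk + wk for pk, wk in zip(p, w)]                      # r = p + w

# Symbol decisions after despreading the noisy chips.
decisions = [1 if z >= 0 else -1 for z in despread(r, code)]
```

Because each ±1 chip correlates perfectly with itself, the despreader recovers the symbol amplitude while averaging the additive disturbance over L chips.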

An embodiment for combining/integrating ANC with the aforementioned communications is illustrated in FIG. 5. Here, voice signal v(n) is added to the adaptive filter output y(n), then the mixed signal propagates through the secondary path S(z) to generate anti-noise y′(n). At the quiet zone (309), the primary noise d(n) is canceled by the anti-noise, resulting in the error signal e_(v)(n) sensed by the error microphone, which contains the residual noise and the audio signal. To avoid interference of the audio with the performance of ANC, the audio signal v(n) is filtered through the secondary-path estimate Ŝ(z) and its contribution is removed from e_(v)(n) to get the true error signal e(n) for updating the adaptive filter A(z).

Using z-domain notation, E_(v)(z) can be expressed as

E_(v)(z)=D(z)−S(z)[Y(z)+V(z)],  (10)

where the actual error signal E(z) may be expressed as

$\begin{matrix}\begin{matrix}{{E(z)} = {{{Ev}(z)} + {{\hat{S}(z)}{V(z)}}}} \\{= {{D(z)} - {{S(z)}\lbrack {{Y(z)} + {V(z)}} \rbrack} + {{\hat{S}(z)}{{V(z)}.}}}}\end{matrix} & (11)\end{matrix}$

Assuming that a perfect secondary-path model is available, i.e., Ŝ(z)=S(z), we have

E(z)=D(z)−S(z)Y(z).  (12)

This shows that the true error signal is obtained in the integrated ANC system, where the voice signal is removed from the signal e_(v)(n) picked up by the error microphone. Therefore, the audio components will not degrade the performance of the noise control filter A(z). Thus, some of the advantages of the integrated ANC system are that (i) it provides an audio comfort signal from the wireless communication devices, (ii) it masks residual noise after noise cancellation, (iii) it eliminates the interference of audio with the performance of the ANC system, and (iv) it integrates with the existing ANC audio hardware, such as amplifiers and loudspeakers, saving overall system cost.
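Equations (10) through (12) can be checked numerically in the time domain. The sketch below is a minimal illustration, assuming short FIR secondary paths and made-up sample values; `fir_filter` and `true_error` are hypothetical helper names. With a perfect path estimate, the recovered error equals d(n) minus the filtered anti-noise and contains no trace of v(n).

```python
def fir_filter(h, x):
    """Causal FIR filtering: out[n] = sum_p h[p] * x[n - p] (zero initial state)."""
    return [sum(h[p] * x[n - p] for p in range(len(h)) if n - p >= 0)
            for n in range(len(x))]


def true_error(e_v, v, s_hat):
    """Recover e(n) = e_v(n) + (s_hat * v)(n), the time-domain form of equation (11)."""
    voice_at_mic = fir_filter(s_hat, v)
    return [a + b for a, b in zip(e_v, voice_at_mic)]


# Made-up signals for the illustration.
d = [1.0, 0.5, -0.3, 0.8, -0.6]     # primary noise at the error microphone
y = [0.2, -0.1, 0.4, 0.0, 0.3]      # adaptive filter output
v = [0.5, 0.5, -0.5, 0.25, 0.1]     # voice signal mixed into the loudspeaker
s = [0.5, 0.25]                     # secondary path S(z)
s_hat = list(s)                     # perfect estimate, as assumed for equation (12)

mixed = [a + b for a, b in zip(y, v)]
e_v = [dn - sn for dn, sn in zip(d, fir_filter(s, mixed))]   # equation (10)
e = true_error(e_v, v, s_hat)                                # equation (11)
expected = [dn - sn for dn, sn in zip(d, fir_filter(s, y))]  # equation (12)
```

If the estimate `s_hat` differs from `s`, a residual proportional to the modeling error remains in `e`, which is why the quality of the secondary-path estimate matters.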

A multiple-channel ANC system such as the one illustrated in FIG. 5 was evaluated with J=1, K=2 and M=2, with recorded incubator noise as the primary noise. The spectra of the error signals before and after ANC at the error microphones are illustrated in FIGS. 6A and 6B. It can be seen that there is a meaningful reduction of the recorded incubator noise over the entire frequency range of interest. Average noise cancellation was found to be 30 dB at a first error microphone (FIG. 6A), and 35 dB at a second error microphone (FIG. 6B). For the wireless communication system, a single-user configuration was simulated and analyzed with a Rayleigh channel, with the DS/SS signal using a Gold code of length L=15. FIG. 7 illustrates the BER vs. SNR results, where it can be seen that the simulated results show a good match with the analytical result.

In addition to the audio signals being transmitted from the infant's incubator, sound analysis (303) can be performed on the emanating audio signal (e.g., cry, coo, etc.) in order to characterize a voice signal. Although it does not have a conventional language form, a baby cry (and similar voice communication) may be considered a kind of speech signal, the character of which is non-stationary and time varying. Under one embodiment, short-time analysis and a threshold method are used to detect the pair of boundary points (start point and end point) of each cry word. Feature extraction of each baby cry word is important in classification and recognition, and numerous algorithms can be used to extract features, such as linear predictive coding (LPC), Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC), and other frequency-domain extraction of stationary features. In this exemplary embodiment, a 10th-order Mel-frequency cepstral coefficient set (MFCC-10) having 10 coefficients is used as a feature pattern for each cry word. It should be understood by those skilled in the art that other numbers of coefficients may be used as well.

Once features are extracted, different statistical methods can be utilized to effect baby cry cause recognition, such as a Gaussian Mixture Model (GMM), Hidden Markov Models (HMM), and an Artificial Neural Network (ANN). In one embodiment discussed herein, an ANN is utilized for baby cry cause recognition. An ANN imitates how human brain neurons work to perform a certain task, and it can be considered a parallel processing network system with a large number of connections. An ANN can learn a rule from examples and generalize relationships between inputs and outputs, or in other words, find patterns in data. A Learning Vector Quantization (LVQ) model can be used to implement multi-class classification. The objective of using an LVQ ANN model for baby-cry-cause recognition is to develop a plurality (e.g., 3) of feature patterns which represent the cluster centroids of each baby-cry-cause: draw attention cry, wet diaper cry, and hungry cry, as an example.

With regard to baby cry classification and recognition techniques, baby cry word boundary point detection may be advantageously employed. A speech signal of comprehensible length is typically a non-stationary signal that cannot be processed by stationary signal processing methods. However, during a limited short-time interval, the speech waveform can be considered stationary. Because of the physical limitation of human vocal cord vibration, in practical applications an interval of 10-30 milliseconds (ms) may be used to complete short-time speech analysis, although other intervals may be used as well. A speech signal may be thought of as comprising a voiced speech component with vocal cord vibration and an unvoiced speech component without vocal cord vibration. A cry word can be defined as the speech waveform duration between a start point and an end point of a voiced speech component. Voiced speech and unvoiced speech have different short-time characteristics, which can be used to detect the boundary points of baby cry words.

Short-time energy (STE) is defined as the average of the square of the sample values in a suitable window, which may be expressed as:

${E(n)} = {\frac{1}{N}{\sum\limits_{m = 0}^{N - 1}\lbrack {{w(m)}{x( {n - m} )}} \rbrack^{2}}}$

where w(m) is the window coefficient corresponding to the signal sample, and N is the window length. The most obvious difference is that voiced speech has higher short-time energy (STE), while unvoiced speech has lower STE. In one embodiment, a Hamming window may be chosen, as it minimizes the maximum side lobe in the frequency domain, and can be described as:

${w(m)} = {{.54} - {{.46}{\cos ( \frac{2\pi \; m}{N - 1} )}}}$

As previously mentioned, short-time processing of speech may preferably take place over segments of 10-30 ms in length. For a signal with an 8 kHz sampling frequency, a window of 128 samples (approximately 16 ms) may be used. STE estimation is useful as a speech detector because there is a noticeable difference in average energy between voiced and unvoiced speech, and between speech and silence. Accordingly, this technique may be paired with short-time zero crossing for a robust detection scheme.
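The STE and Hamming window definitions above translate directly into code. The following is a minimal sketch, assuming samples before the start of the signal are treated as zero; the function names and the synthetic test tones are illustrative choices, not part of the disclosure.

```python
import math


def hamming(N):
    """Hamming window w(m) = 0.54 - 0.46 cos(2 pi m / (N - 1)), m = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (N - 1)) for m in range(N)]


def short_time_energy(x, n, w):
    """E(n) = (1/N) * sum_m [w(m) x(n - m)]^2, skipping indices before the signal start."""
    N = len(w)
    return sum((w[m] * x[n - m]) ** 2 for m in range(N) if n - m >= 0) / N


fs = 8000                      # sampling frequency in Hz
w = hamming(128)               # 128 samples is 16 ms at 8 kHz
loud = [math.sin(2.0 * math.pi * 400.0 * t / fs) for t in range(256)]
quiet = [0.01 * s for s in loud]

ste_loud = short_time_energy(loud, 255, w)
ste_quiet = short_time_energy(quiet, 255, w)
```

Scaling the amplitude by 0.01 scales the STE by 0.0001, which is the kind of gap a simple threshold can exploit.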

Short-time zero crossing (STZC) may be defined as the rate at which the signal changes sign. It can be mathematically described as:

$Z(n) = \frac{1}{N}\sum\limits_{m = 0}^{N - 1}\left| \mathrm{sign}(x(n - m)) - \mathrm{sign}(x(n - m - 1)) \right|$, where sign(x(m)) = 1 if x(m) ≥ 0, and sign(x(m)) = −1 otherwise.

STZC estimation is useful as a speech detector because there are noticeably fewer zero crossings in voiced speech as compared with unvoiced speech. STZC is advantageous in that it is capable of predicting cry signal start and end points. Significant short-time zero crossing effectively describes the envelope of a non-silent signal and, combined with short-time energy, can effectively track instances of potentially voiced signals that are the signals of interest for analysis.
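A short-time zero-crossing measure can be sketched as follows. The sketch takes the absolute value of each sign difference, as is customary, so that crossings in either direction are counted, and it keeps the 1/N normalization used in the text (so each crossing contributes 2/N). The function names are assumptions for the example.

```python
def sign(value):
    """sign(x) = 1 if x >= 0, and -1 otherwise."""
    return 1 if value >= 0 else -1


def short_time_zero_crossing(x, n, N):
    """Z(n) = (1/N) * sum_m |sign(x(n - m)) - sign(x(n - m - 1))| over the last N samples."""
    total = 0
    for m in range(N):
        if n - m - 1 < 0:      # stop at the start of the signal
            break
        total += abs(sign(x[n - m]) - sign(x[n - m - 1]))
    return total / N


alternating = [1, -1] * 10     # changes sign at every sample
constant = [0.5] * 20          # never changes sign

z_alt = short_time_zero_crossing(alternating, 19, 8)
z_const = short_time_zero_crossing(constant, 19, 8)
```

With this normalization the measure ranges from 0 (no crossings) to 2 (a crossing at every sample); dividing by 2N instead would give a rate between 0 and 1.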

Some false positive cries may be detected, as not all signals bounded by the STZC boundary contain cries. Large STZC envelopes with low energy tend to contain cry precursors such as whimpers and breathing events. Likewise, not all signals with non-negligible STE contain cries. Infant coughing events may be bounded by an STZC boundary and contain a noticeable STE. In order to consistently pick up desired cry events, a desired cry may be defined as a voiced segment of sufficiently long duration. Two quantifiable threshold conditions that need to be met to constitute a desired voiced segment may be:

-   -   1) Normalized energy > 0.05 (to eliminate non-voiced artifacts such as breathing/whimpering and to supersede cry precursors)
    -   2) Signal envelope period > 0.1 seconds (to eliminate impulsive voiced artifacts such as coughing)

Returning to STE processing, as baby cry signals may be downsampled from 44.1 kHz to 7350 Hz, a window length N may be chosen as 128, which translates to a 17.4 ms short-time interval. In order to detect the boundary points of cry words by setting a proper threshold value, the STE must be normalized into the range from 0 to 1 by dividing by the maximum STE value over the whole duration. To eliminate unvoiced artifacts of low STE or very short duration high-energy impulses, two quantifiable thresholds should be set to detect the cry word boundary points. Those two threshold conditions are:

-   -   (1) Normalized STE > 0.05 (to eliminate unvoiced artifacts such as whimpers and breathing), and
    -   (2) Interval between start point and end point of a cry word > 0.14 second (at least about 1024 signal samples, to eliminate impulsive voiced artifacts such as coughing)

The start points and end points of the voiced speech components can be detected by the normalized STE threshold, and short-duration false cry words can be eliminated by the interval threshold.
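The two threshold conditions above can be combined into a simple boundary detector. The sketch below is illustrative only: it assumes one STE value per signal sample, uses the thresholds quoted in the text (0.05 and 1024 samples) as defaults, and the function name `detect_cry_words` is hypothetical.

```python
def detect_cry_words(ste, energy_threshold=0.05, min_samples=1024):
    """Return (start, end) index pairs of segments whose normalized STE exceeds
    energy_threshold for at least min_samples consecutive samples."""
    peak = max(ste) if ste else 0.0
    if peak <= 0.0:
        return []
    normalized = [value / peak for value in ste]   # scale into the range 0..1

    words = []
    start = None
    for i, value in enumerate(normalized):
        if value > energy_threshold and start is None:
            start = i                              # candidate start point
        elif value <= energy_threshold and start is not None:
            if i - start >= min_samples:           # interval threshold
                words.append((start, i))
            start = None
    if start is not None and len(normalized) - start >= min_samples:
        words.append((start, len(normalized)))
    return words


# Synthetic STE track: silence, a long voiced burst, silence, a short impulse.
ste_track = [0.0] * 100 + [1.0] * 2000 + [0.0] * 100 + [1.0] * 200 + [0.0] * 50
cry_words = detect_cry_words(ste_track)
```

The short 200-sample burst passes the energy test but is rejected by the interval test, mirroring how cough-like impulses are discarded.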

A short-time segment of speech can be considered stationary. Stationary feature extraction techniques can be compartmentalized into either cepstral-based algorithms (taking the Fourier transform of the decibel spectrum) or linear-predictor-based algorithms (determining the current speech sample based on a linear combination of prior samples). In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. In practical applications of speech recognition, Mel-frequency cepstral coefficients (MFCC) are considered the best characteristic parameter, being closest to the non-linear low and high frequency perception of the human ear.

The mel scale is a perceptual scale of pitches, based upon the human perception of the separation between pitches. The reference point relating the mel scale to standard frequency may be defined by a 1000 Hz tone, 40 dB above the listener's threshold, which is equivalent to a pitch of 1000 mels. What the mel frequency cepstrum provides is a tool that describes the tonal characteristics of a signal, warped such that it better matches human perceptual hearing of tones (or pitches). The conversion between mel (m) and Hertz (f) can be described as

$m = {2595{{\log_{10}\lbrack {\frac{f}{700} + 1} \rbrack}.}}$
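As a quick numerical check of the conversion above, the following sketch implements the formula and its algebraic inverse. The function names are illustrative; the inverse is simply the same equation solved for f.

```python
import math


def hz_to_mel(f):
    """m = 2595 * log10(f / 700 + 1)."""
    return 2595.0 * math.log10(f / 700.0 + 1.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: f = 700 * (10^(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


mel_at_1khz = hz_to_mel(1000.0)       # close to the 1000 mel reference point
round_trip = mel_to_hz(hz_to_mel(440.0))
```

The formula maps 1000 Hz to very nearly 1000 mels, consistent with the reference point described in the text, and compresses higher frequencies progressively.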

The mel frequency cepstrum may be obtained through the following steps. A short-time Fourier transform of the signal is taken in order to obtain the quasi-stationary short-time power spectrum F(f)=F{f(t)}. The frequency portion of the spectrum is then mapped to the mel scale perceptual filter bank with the equation above, using 18 triangle band pass filters equally spaced on the mel range of frequency F(m). These triangle band pass filters smooth the magnitude spectrum such that the harmonics are flattened in order to obtain the envelope of the spectrum with harmonics. This indicates that the pitch of a speech signal is generally not present in MFCC. As a result, a recognition system will behave more or less the same when the input utterances are of the same timbre but with different tones/pitch. This also serves to reduce the size of the features involved, making the classification simpler.

The log of this filtered spectrum is taken, and the squared magnitude of the Fourier transform of the log spectrum then gives the power cepstrum of the signal, or

|F{log(|F(m)|²)}|².

At this point, the discrete cosine transform (DCT)

$X_{k} = \sum\limits_{n = 0}^{N - 1} x_{n} \cos\left[ \frac{\pi}{N}\left( n + \frac{1}{2} \right)k \right]$

of the power cepstrum is taken to obtain the MFCC, which may be used to measure audio signal similarity. The DCT coefficients are retained as they represent the power amplitudes of the mel frequency cepstrum. To keep the codebook length similar, an n^(th) (e.g., 10^(th)) order MFCC may be obtained. However, in addition to the MFCC, and in order to have a more similar basis in algorithm for comparison in feature classification, the MFLPCC may be used as well. The power cepstrum may possess the same sampling rate as the signal, so the MFLPCC is obtained by performing an LPC algorithm on the power cepstrum in 128-sample frames. The MFLPCC encodes the cepstrum waveform in a more compact fashion that may make it more suitable for a baby cry classification scheme.

An exemplary MFCC feature extraction procedure is illustrated in FIG. 8. The procedure shown in the figure can be implemented step by step as follows:

-   -   Step 1. Take the discrete Fourier transform (DFT) of signal 801, where the N-point DFT can be expressed as follows:

$X(k) = \sum\limits_{n = 0}^{N - 1} x(n)\, e^{- \frac{j2\pi kn}{N}}$

-   -   Step 2. Square each spectrum amplitude value 802 to get power        spectrum:

P(k)=|X(k)|²

-   -   Step 3. Convolute the power spectrum P(k) with a Mel scaled        triangular filter bank 803, which is shown in FIG. 9.

Again, for this example, the number of subband filters is 10, and the P(k) are binned onto the mel-scaled frequency axis using 10 overlapping triangular filters. Here, binning means that each P(k) is multiplied by the corresponding filter gain and the results are accumulated as the energy in each band. The relationship between frequency and the Mel scale can be expressed as follows:

${{Mel}(f)} = {2595\mspace{14mu} {\log_{10}( {1 + \frac{f}{700}} )}}$

The resulting nonlinear Mel frequency curve is illustrated in FIG. 10.

-   -   Step 4. Take logarithm 804:

${L_{m} = {\log( {\sum\limits_{k = 0}^{N - 1}\; {{{X(k)}}^{2}{H_{m}(k)}}} )}},{0 \leq m < M}$

where N is the number of DFT points, and M=10.

-   -   Step 5. Take discrete cosine transform (DCT) 805 to get MFCC:

$C_{m} = \sum\limits_{n = 0}^{M - 1} L_{n} \cos\left( \frac{\pi m (n + 0.5)}{M} \right), \quad 0 \leq m < M$

where MFCC order M is 10.
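Steps 1 through 5 can be strung together in a compact sketch. The code below is an illustration under several assumptions that are not specified in the text: a direct O(N²) DFT is used for self-containment, the triangular filters are spaced evenly in mel between 0 Hz and half the sampling rate, a small constant guards the logarithm against empty bands, and the DCT sums the log band energies over the band index. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Steps 1-2: N-point DFT, then P(k) = |X(k)|^2 for k = 0..N/2."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2 + 1):
        X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spectrum.append(abs(X) ** 2)
    return spectrum


def mel_filterbank(num_filters, n_fft, fs):
    """Step 3: triangular gains H_m(k), edges equally spaced on the mel scale."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / fs
             for i in range(num_filters + 2)]          # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))      # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))      # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, fs, num_filters=10):
    """Steps 4-5: log band energies, then a DCT over the band index."""
    P = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), fs)
    log_energy = [math.log(sum(p * h for p, h in zip(P, row)) + 1e-12) for row in bank]
    M = num_filters
    return [sum(log_energy[n] * math.cos(math.pi * m * (n + 0.5) / M) for n in range(M))
            for m in range(M)]


fs = 7350
frame = [math.sin(2 * math.pi * 500 * n / fs) + 0.5 * math.sin(2 * math.pi * 1500 * n / fs)
         for n in range(128)]
features = mfcc(frame, fs)               # one MFCC-10 feature vector
```

One property worth noting: scaling the input amplitude changes only the zeroth coefficient, since the cosine terms for m ≥ 1 sum to zero over the bands. This is part of why cepstral features are fairly robust to recording level.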

In one embodiment, a Learning Vector Quantization (LVQ) neural network model is used. A self-organizing neural network has the ability to assess the input patterns presented to the network, organize itself to learn from the collective set of inputs, and categorize them into groups of similar patterns. In general, self-organized learning involves the frequent modification of the network's synaptic weights in response to a set of input patterns. LVQ is such a self-organizing neural network model that can be used to classify the different baby cry causes. LVQ may be considered a kind of feed-forward ANN, and is advantageously used in areas of pattern recognition or optimization.

Different baby-cry-causes may be assumed to have different feature patterns; as such, the objective of classification is to determine a general feature pattern, a kind of MFCC “codebook,” from example training feature data for a specific baby cry cause, such as a “draw attention” cry, a “need to change wet diaper” cry, a “hungry” cry, etc. Subsequently, a baby cry of unknown cause may be recognized by finding the shortest distance between the input unknown cry word MFCC-10 feature vector and every class “codebook.”

An LVQ algorithm may be used to complete a baby-cry-cause classification, where a plurality of baby-cry-causes may be taken into consideration (e.g., draw attention, diaper change needed, hungry, etc.). Thus, an exemplary LVQ neural network would have a plurality (e.g., 3) of output classes corresponding to the main baby-cry-causes:

-   -   Class 1: Draw attention cry
    -   Class 2: Diaper change needed cry
    -   Class 3: Hungry cry

An exemplary LVQ architecture is shown in FIG. 11. The input vector in this example is a 10-dimensional cry word MFCC-10 feature, which can be expressed as:

X=[x ₁ x ₂ . . . x ₁₀]^(T)

and all the weights relating the input vector to the output classes can be expressed as:

$W = \begin{bmatrix} W_{1} & W_{2} & W_{3} \end{bmatrix} = \begin{bmatrix} w_{1,1} & w_{2,1} & w_{3,1} \\ \vdots & \vdots & \vdots \\ w_{1,10} & w_{2,10} & w_{3,10} \end{bmatrix}$

where W₁=[w_(1,1) w_(1,2) . . . w_(1,10)]^(T) represents the pattern “codebook” of the draw attention cry, W₂=[w_(2,1) w_(2,2) . . . w_(2,10)]^(T) represents the pattern “codebook” of the diaper change needed cry, and W₃=[w_(3,1) w_(3,2) . . . w_(3,10)]^(T) represents the pattern “codebook” of the hungry cry.

The exemplary LVQ neural network model may be trained using the following steps:

-   -   Step 1. Initialize all weight vectors W₁(0), W₂(0), and W₃(0) by choosing a cry word MFCC-10 from each baby cry cause class. Initialize the adaptive learning step size

$\mu(k) = \frac{\mu(0)}{k}$, $\mu(0) = 0.1$, and $k = 1, 2, \ldots, N$,

where N is the number of iterations.

-   -   Step 2. For each training input vector X_(i), perform step 3 and step 4:
    -   Step 3. Determine the weight vector index j such that the squared Euclidean distance

∥X(k)−W _(j)(k)∥²

is minimal, and

C_(W_(j)(k)) = j.

-   -   Step 4. Update the appropriate weight vector W_(j)(k) as        follows:

$\quad\{ \begin{matrix}{{{W_{j}( {k + 1} )} = {{W_{j}(k)} + {{\mu (k)}\lbrack {{X(k)} - {W_{j}(k)}} \rbrack}}},{C_{W_{j}{(k)}} = C_{X{(k)}}}} \\{{{W_{j}( {k + 1} )} = {{W_{j}(k)} - {{\mu (k)}\lbrack {{X(k)} - {W_{j}(k)}} \rbrack}}},{C_{W_{j}{(k)}} \neq C_{X{(k)}}}}\end{matrix} $

where C_(X(k)) is the known class index of input X at time k; for example, if input X(k) is the MFCC-10 of a hungry cry word, C_(X(k))=3. Preferably, only W_(j) is updated, and the updating rule depends on whether the class index of the input pattern equals the index j obtained in Step 3.

-   -   Step 5. Repeat steps 2, 3, and 4 until k=N.

After training is finished, W₁(N), W₂(N), and W₃(N) may be considered the pattern “codebooks” for the three baby-cry-causes exemplified above, respectively.
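The training steps above can be sketched as a short LVQ routine. This is a minimal illustration, not the disclosed implementation: it assumes that k counts complete passes over the training set, uses zero-based class indices, and is exercised on made-up two-dimensional points rather than MFCC-10 vectors. The names `train_lvq` and `classify` are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, W):
    """Step 3: index j of the weight vector closest to x."""
    return min(range(len(W)), key=lambda j: squared_distance(x, W[j]))


def train_lvq(samples, labels, initial_weights, mu0=0.1, iterations=300):
    """Steps 1-5 with step size mu(k) = mu0 / k on pass k (an assumed schedule)."""
    W = [list(w) for w in initial_weights]          # Step 1: initial codebooks
    for k in range(1, iterations + 1):
        mu = mu0 / k
        for x, label in zip(samples, labels):       # Step 2
            j = nearest(x, W)                       # Step 3
            direction = 1.0 if j == label else -1.0
            # Step 4: pull the winner toward x if the class matches, push it away otherwise.
            W[j] = [w + direction * mu * (xi - w) for xi, w in zip(x, W[j])]
    return W


def classify(x, W):
    """Recognition: the class whose codebook is nearest to x."""
    return nearest(x, W)


# Made-up 2-D training data with three well-separated classes.
samples = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.0, 5.0), (0.1, 5.2)]
labels = [0, 0, 1, 1, 2, 2]
codebooks = train_lvq(samples, labels, [samples[0], samples[2], samples[4]])
```

Because the step size decays as 1/k, later passes move the codebooks less and less, which is why the weight vectors settle after a few hundred iterations.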

The MFCC-10 features of the “draw attention cry words,” “diaper change needed cry words,” and “hungry cry words” of 4 different babies are illustrated in FIGS. 12A-C, respectively. After numerous (e.g., 300) iterations, the values of the weight vectors W₁, W₂, W₃, which represent the centroid of each cause class, are fixed, and the centroid curves of each class are shown in FIG. 12D.

In another embodiment, linear predictive coding (LPC) may be utilized to obtain baby cry characteristics. In certain cases, the waveforms of two similar sounds will also show similar characteristics. If two infant cries have very similar waveforms, it stands to reason that they should possess the same impetus. However, it is impractical to conduct a full sample-by-sample comparison between cry signals, due to the complexity inherent in having audio signals of around 1 second in length at a sampling rate of 8 kHz. To make the time-domain comparison of infant cry signals tractable, linear predictive coding (LPC) is applied.

As mentioned previously, there may be two acoustic sources associated with voiced and unvoiced speech, respectively. Voiced speech is caused by the vibration of the vocal cords in response to airflow from the lungs, and this vibration is periodic in nature, while unvoiced speech is caused by constrictions in the air tract resulting in random airflow. The basis of the source-filter model of speech is that speech can be synthesized by generating an acoustic source and passing it through an all-pole filter. The linear predictive coding (LPC) algorithm produces a vector of coefficients that represent a spectral shaping filter. An input signal to this filter is either a pitch (impulse) train for voiced sounds, or white noise for unvoiced sounds. This shaping filter may be an all-pole filter represented as:

${{H(z)} = \frac{1}{1 - {\sum\limits_{i = 1}^{M}\; {a_{i}z^{- i}}}}},$

where {a_(i)} are the linear prediction coefficients and M is the number of poles (the roots of the denominator of the z-transform). A present sample of speech may be represented as a linear combination of the past M samples of the speech such that:

${{\hat{x}(n)} = {{{a_{1}{x( {n - 1} )}} + {a_{2}{x( {n - 2} )}} + \ldots + {a_{M}{x( {n - M} )}}} = {\sum\limits_{i = 1}^{M}\; {a_{i}{x( {n - i} )}}}}},$

where x̂(n) is the predicted value of x(n).

The error between the actual and predicted signal can be defined as

${ɛ(n)} = {{{x(n)} - {\hat{x}(n)}} = {{x(n)} - {\sum\limits_{i = 1}^{M}\; {a_{i}{{x( {n - i} )}.}}}}}$

The smaller the error, the better the spectral shaping filter is at synthesizing the appropriate signal. Taking the derivative of the squared error with respect to each a_(i) and equating it to 0 yields:

$\langle \varepsilon(n), x(n - i) \rangle = \sum\limits_{n} \varepsilon(n)\, x(n - i) = 0, \quad i = 1, \ldots, M$

Minimizing the error yields a set of linear equations in the prediction coefficients. To obtain the minimum mean square error, an autocorrelation method may be used, in which the minimum is found by applying the principle of orthogonality: for the predictor coefficients that minimize it, the prediction error must be orthogonal to the past samples.

$R = \begin{bmatrix}{R(0)} & {R(1)} & \ldots & {R( {n - 1} )} \\{R(1)} & {R(0)} & \ddots & \vdots \\\vdots & \ddots & \ddots & \vdots \\{R( {n - 1} )} & \ldots & \ldots & {R(0)}\end{bmatrix}$

This can be achieved by forming a Toeplitz autocorrelation matrix R to find the LPC parameters and using the Levinson-Durbin recursion to solve the Toeplitz system.
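The autocorrelation method and the Levinson-Durbin recursion can be sketched as follows. The sketch uses the sign convention of the predictor above, x̂(n) = Σ a_i x(n−i), and returns both the coefficients and the final prediction error energy. The function names and the small test sequence are illustrative assumptions.

```python
def autocorrelation(x, max_lag):
    """R(k) = sum_n x(n) x(n - k) for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations sum_j a_j R(|i - j|) = R(i), i = 1..order.

    Returns (a, err) where a[i - 1] is the coefficient a_i and err is the
    residual prediction error energy.
    """
    a = [0.0] * (order + 1)        # a[0] is unused so that indices match a_i
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Autocorrelation values of an ideal first-order process with a_1 = 0.5.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)

# The same recursion applied to a measured (here: synthetic, decaying) sequence.
signal = [0.9 ** n for n in range(200)]
lpc1, _ = levinson_durbin(autocorrelation(signal, 1), order=1)
```

The recursion costs on the order of M² operations instead of the M³ of a general linear solve, which is one reason it is the usual choice for LPC-10 on short frames.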

Effectively, the purpose of LPC is to take a waveform of a large size in unit samples and compress it into a more manageable form. Because similar waveforms should also result in similar acoustic output, LPC serves as a time domain measure of how close two different waveforms are.

Because of the sampling rate of 8 kHz and the generalization that f/1000+2 LPC coefficients (with f the sampling rate in Hz) are the minimum required to decompose a waveform, 10 LPC coefficients (LPC-10) may be used to describe each 128-sample frame, which corresponds to 16 ms and is assumed to be short-time stationary. Instead of computing the difference between windowed segments of 128 samples in length, only comparisons of the LPC-10 values of the segments are needed. Furthermore, during signal preprocessing, a first order low pass filter can be used to brighten the signal such that components due to non-vocal tract speech can be attenuated.

In another embodiment, cepstrum analysis may be used to obtain baby cry characteristics. To obtain the frequency spectrum F(w), a Fourier transform, denoted by F{ }, must be performed on the time domain signal f(t) as F(w)=F{f(t)}. However, it is possible to take the Fourier transform of the log spectrum as if it were a signal as well. The result of this is the power cepstrum:

|F{log(|F{f(t)}|²)}|².

The cepstrum provides information about the rate of change in the different spectrum bands. This attribute can be exploited as a pitch detector. For example, if the sampling rate of a cry signal is 8 kHz and there is a large peak in the cepstrum where the quefrency (the x-axis analog of frequency in the cepstrum domain) is 20 samples, the peak indicates the existence of a pitch of 8000/20=400 Hz. This peak occurs in the cepstrum because the harmonics in the spectrum are periodic, and the period corresponds to the pitch.
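The pitch example above can be reproduced with a small cepstral pitch detector. The sketch below is only an illustration: it uses a direct DFT for self-containment, adds a small constant before the logarithm, and searches an assumed pitch range of 250 to 600 Hz. The synthetic test frame (a decaying pulse repeating every 20 samples) and all function names are invented for the example.

```python
import cmath
import math


def dft(x):
    """Direct N-point DFT (O(N^2)); adequate for short analysis frames."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def power_cepstrum(frame, eps=1e-12):
    """|F{ log(|F{f(t)}|^2) }|^2, with eps guarding against log(0)."""
    log_power = [math.log(abs(X) ** 2 + eps) for X in dft(frame)]
    return [abs(C) ** 2 for C in dft(log_power)]


def cepstral_pitch(frame, fs, f_min, f_max):
    """Pitch estimate fs / q, where q is the quefrency of the largest cepstral peak
    inside the range corresponding to [f_min, f_max]."""
    ceps = power_cepstrum(frame)
    q_low = int(math.ceil(fs / f_max))
    q_high = min(int(fs // f_min), len(frame) // 2)
    q_peak = max(range(q_low, q_high + 1), key=lambda q: ceps[q])
    return fs / q_peak


fs = 8000
# Synthetic voiced frame: a decaying pulse that repeats every 20 samples.
frame = [0.6 ** (n % 20) for n in range(240)]
pitch = cepstral_pitch(frame, fs, f_min=250, f_max=600)   # 8000 / 20 = 400 Hz
```

Restricting the search to a plausible pitch range keeps the detector from locking onto the large low-quefrency values that describe the spectral envelope, or onto multiples of the true period.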

Cepstrum pitch determination is particularly effective because the effects of the vocal excitation (pitch) and vocal tract (formants) are additive in the logarithm of the power spectrum and thus clearly separate. This trait makes cepstrum analysis of audio signals more robust than processing normal frequency or time domain samples. Another technique used to improve the accuracy of feature extraction in cepstrum-based techniques is liftering. Liftering applies a low order low pass filter to the cepstrum in order to smooth it out and help with the Discrete Cosine Transform (DCT) analysis for feature extraction techniques in ensuing sections. Additionally, linear predictive cepstral coefficients (LPCC) may be used for audio feature extraction. LPCCs may be obtained by applying linear predictive coding to the cepstrum. As mentioned above, the cepstrum is a measure of the rate of change in spectrum bands over windowed segments of individual cries. Applying LPC to the cepstrum yields a vector of values for a 10-tap filter that would synthesize the cepstrum waveform.

Similar to the MFCC, the bark frequency cepstral coefficients (BFCC) warp the power cepstrum such that it matches human perception of loudness. The methodology of obtaining the BFCC is similar to that of the MFCC except for two differences. The frequencies are converted to the bark scale according to:

${b = {{13\mspace{11mu} {\tan^{- 1}( {{.00076}\mspace{11mu} f} )}} + {3.5\mspace{11mu} {\tan^{- 1}\lbrack ( \frac{f}{7500} )^{2} \rbrack}}}},$

where b denotes bark frequency and f is frequency in hertz. The mapped bark frequency is passed through a plurality (e.g., 18) of triangle band pass filters. The center frequencies of these triangular band pass filters correspond to the first 18 of the 24 critical frequency bands of hearing (where the band edges are at 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000 and 15500 Hz). This is done because frequencies above 4 kHz may be attenuated by the low pass anti-aliasing filter described in signal preprocessing. This also allows for a more direct comparison between the MFLPCC and BFLPCC later on.
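The bark conversion can be evaluated directly at the band edges listed above. The sketch below simply implements the formula; the function name `hz_to_bark` and the choice to tabulate the first 19 edges (which bound the first 18 bands) are illustrative.

```python
import math


def hz_to_bark(f):
    """b = 13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


# Band edges (Hz) that bound the first 18 critical bands named in the text.
band_edges_hz = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]
band_edges_bark = [hz_to_bark(f) for f in band_edges_hz]
```

Successive edges land roughly one bark apart, which is the sense in which the filters are evenly spaced on the perceptual scale even though their widths in hertz grow with frequency.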

The BFCC is obtained by taking the DCT of the bark frequency cepstrum, and the DCT coefficients describe the amplitudes of the cepstrum. The power cepstrum also possesses the same sampling rate as the signal, so the BFLPCC is obtained by performing the LPC algorithm on the power cepstrum in 128-sample frames. The BFLPCC encodes the cepstrum waveform in a more compact fashion that may make it more suitable for a baby-cry classification scheme.

In another exemplary embodiment, Kalman filters may be utilized for baby voice feature extraction. One characteristic of analog generated sources of noise is that no two signals are identical. As similar as two sounds may be, they will inherently vary to some degree in pitch, volume and intonation. Regardless, it can be said that adjoining infant cries are highly similar and most likely have the same meaning. In order to estimate the true cry from the recorded cries, a Kalman filter formulation may be used.

If x(n) is modeled as an AR(p) process (auto-regressive process of order p), it may be generated according to

$\begin{matrix}{{x(n)} = {{\sum\limits_{k = 1}^{p}\; {{a(k)}{x( {n - k} )}}} + {{w(n)}.}}} & (A)\end{matrix}$

Supposing that x(n) is measured in the presence of additive noise, then

y(n)=x(n)+v(n)  (B)

If we let x(n) be the p-dimensional state vector

${x(n)} = \begin{bmatrix}{x(n)} \\{x( {n - 1} )} \\\vdots \\{x( {n - p + 1} )}\end{bmatrix}$

then (A) and (B) can be expressed in terms of x(n) as

$\begin{matrix}{{{x(n)} = {{\begin{bmatrix}{a(1)} & {a(2)} & \ldots & {a( {p - 1} )} & {a(p)} \\1 & 0 & \ldots & 0 & 0 \\0 & 1 & \ldots & 0 & 0 \\\vdots & \vdots & \ldots & \; & \vdots \\0 & 0 & \ldots & 1 & 0\end{bmatrix}{x( {n - 1} )}} + {\begin{bmatrix}1 \\0 \\0 \\\vdots \\0\end{bmatrix}{w(n)}}}}{and}} & (C) \\{{y(n)} = {{\lbrack {1,0,\ldots \mspace{14mu},0} \rbrack {x(n)}} + {v(n)}}} & (D)\end{matrix}$

Equations (C) and (D) can be simplified using matrix notation:

x(n)=Ax(n−1)+w(n)

y(n)=c ^(T) x(n)+v(n)  (E)

where A is a p×p state transition matrix, w(n)=[w(n), 0, . . . , 0]^(T) is a vector noise process and c is a unit vector of length p. Even though it is applicable primarily in stationary AR(p) processes, (D) can be generalized to a non-stationary process by letting x(n) be a state vector of dimension p that evolves according to the difference equation

x(n)=A(n−1)x(n−1)+w(n)

where A(n−1) is a time varying p×p state transition matrix and w(n) is a vector of zero-mean white noise processes, and by letting y(n) be a vector of observations that are formed according to

y(n)=C(n)x(n)+v(n)

where y(n) is a vector of length q, C(n) is a time varying q×p matrix and v(n) is a vector of zero-mean white noise processes that are statistically independent of w(n).
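The state-space form in equations (C) through (E) can be illustrated by building the companion-form transition matrix from a set of AR coefficients and advancing the state by one step. The sketch covers only the signal model, not a Kalman filter itself; the function names and the numeric values are assumptions for the example.

```python
def companion_matrix(a):
    """p x p state transition matrix A: AR coefficients in the first row,
    a shifted identity below, as in equation (C)."""
    p = len(a)
    A = [list(a)]
    for i in range(1, p):
        A.append([1.0 if j == i - 1 else 0.0 for j in range(p)])
    return A


def state_step(A, state, w):
    """x(n) = A x(n - 1) + [w, 0, ..., 0]^T."""
    p = len(state)
    new_state = [sum(A[i][j] * state[j] for j in range(p)) for i in range(p)]
    new_state[0] += w
    return new_state


def observe(state, v):
    """y(n) = c^T x(n) + v(n) with c = [1, 0, ..., 0]^T, as in equation (D)."""
    return state[0] + v


ar_coeffs = [0.5, -0.25]            # a(1), a(2) of a hypothetical AR(2) model
A = companion_matrix(ar_coeffs)
previous = [2.0, 4.0]               # [x(n - 1), x(n - 2)]
current = state_step(A, previous, w=0.1)
y = observe(current, v=0.05)
```

The first state entry reproduces the AR recursion of equation (A), while the remaining entries simply shift older samples down by one position.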

It can be appreciated by those skilled in the art that the present disclosure provides innovative systems, apparatuses and methods for electronic devices that integrate active noise control (ANC) techniques for abating environmental noises with a communication system that communicates to and from an infant. Such configurations may be advantageously used for infant incubators, hospital beds, and the like. The wireless communication system can also provide communication between infants and their parents/caregivers/nurses, and between patients and family members/nurses/physicians, and can also provide intelligent digital monitoring that provides non-invasive detection and classification of an infant's audio signals and other audio signals.

In the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

What is claimed is:
 1. An enclosure, comprising: a noise cancellationportion, comprising a controller unit, configured to be operativelycoupled to one or more error microphones and a reference sensing unit,wherein the controller unit processes signals received from one or moreerror microphones and reference sensing unit to reduce noise in an areawithin the enclose using one or more speakers; and a communicationsportion, comprising a sound analyzer and transmitter, wherein thecommunication portion is operatively coupled to the noise cancellationportion, said communications portion being configured to receive a voicesignal from the enclosure and transform the voice signal to identifycharacteristics thereof.
 2. The enclosure of claim 1, wherein thecommunications portion is configured to extract features from the voicesignal.
 3. The enclosure of claim 2, wherein the features comprise at least one of linear predictive coding (LPC), Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC).
 4. The enclosure of claim 2, wherein the communications portion is configured to identify characteristics of the features of voice signal using at least one of a Gaussian mixture model (GMM), hidden Markov model (HMM), and artificial neural network (ANN).
 5. The enclosure of claim 1, wherein the characteristics of the voice signal comprise at least one of an emotional or physiological state.
 6. The enclosure of claim 1, further comprising a voice input operatively coupled to the noise cancellation portion, wherein the voice input is configured to receive external voice signals for reproduction on the one or more speakers.
 7. The enclosure of claim 6, wherein the noise cancellation portion is configured to filter the external voice signals to minimize interference with signals received from one or more error microphones and reference sensing unit for reducing noise in the area within the enclosure.
 8. A method for providing noise cancellation and communication within an enclosure, comprising: processing signals, received from one or more error microphones and reference sensing unit, in a controller of a noise cancellation portion to reduce noise in an area within the enclosure using one or more speakers; receiving internal voice signals from the enclosure; extracting features from the internal voice signals; and identifying characteristics of the voice signals based on the transformation.
 9. The method of claim 8, wherein the transformation transforms the voice signal from a time domain to a frequency domain.
 10. The method of claim 9, wherein the features comprise at least one of linear predictive coding (LPC), Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC) and short-time zero crossing.
 11. The method of claim 9, wherein characteristics are identified of the transformed voice signal using at least one of a Gaussian mixture model (GMM), hidden Markov model (HMM), and artificial neural network (ANN).
 12. The method of claim 8, wherein the characteristics of the voice signal comprise at least one of an emotional or physiological state.
 13. The method of claim 8, further comprising the step of receiving an external voice signals from the enclosure for reproduction on the one or more speakers within the enclosure.
 14. The method of claim 13, wherein the signals are processed in the noise cancellation portion to filter the external voice signals to minimize interference with the signals received from one or more error microphones and reference sensing unit to reduce noise in the area within the enclosure.
 15. An enclosure, comprising: a noise cancellation portion, comprising a controller unit, configured to be operatively coupled to one or more error microphones and a reference sensing unit, wherein the controller unit processes signals received from one or more error microphones and reference sensing unit to reduce noise in an area within the enclosure using one or more speakers; a communications portion, comprising a sound analyzer and transmitter, wherein the communication portion is operatively coupled to the noise cancellation portion, said communications portion being configured to receive a voice signal from the enclosure and transform the voice signal to identify characteristics thereof; and a voice input apparatus operatively coupled to the noise cancellation portion, wherein the voice input apparatus is configured to receive external voice signals for reproduction on the one or more speakers.
 16. The enclosure of claim 15, wherein the communications portion is configured to extract features from the voice signal.
 17. The enclosure of claim 16, wherein the feature comprises at least one of linear predictive coding (LPC), Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC) and short-time zero crossing.
 18. The enclosure of claim 16, wherein the communications portion is configured to identify characteristics of the features of the voice signal using at least one of a Gaussian mixture model (GMM), hidden Markov model (HMM), and artificial neural network (ANN).
 19. The enclosure of claim 15, wherein the characteristics of the voice signal comprise at least one of an emotional or physiological state.
 20. The enclosure of claim 15, wherein the noise cancellation portion is configured to filter the external voice signals to minimize interference with signals received from one or more error microphones and reference sensing unit for reducing noise in the area within the enclosure.